1.   INTRODUCTION

Consider  a  game  in  which  a  single  long-run  player  faces  an  infinite 
sequence of opponents, each of whom plays only once. While such a game will
often  have  multiple  equilibria,  a  common  intuition  is  that  the  "most 
reasonable"  equilibrium  is  the  one  which  the  long-run  player  most  prefers. 
This paper shows that "reputation effects" provide a foundation for that
intuition,  and  it  also  identifies  an  important  way  in  which  the  intuition 
must  be  qualified. 

More  specifically,  imagine  that  players  move  simultaneously  in  each 
period,  and  let  the  "Stackelberg  outcome"  be  the  long-run  player's  most 
preferred  outcome  of  the  stage  game  under  the  constraint  that  each  short-run 
player  chooses  an  action  that  maximizes  his  single-period  payoff.   Now 
formulate  the  situation  as  a  game  of  incomplete  information,  and  imagine 
that  with  non-zero  probability  the  long-run  player  is  a  "type"  who  always 
plays  his  Stackelberg  action.   When  the  discount  factor  is  sufficiently  near 
to  one,  any  Nash  equilibrium  must  give  the  long-run  player  almost  his 
Stackelberg  payoff.   The  intuition  is  the  familiar  one  that  the  long-run 
player  can  choose  to  play  as  if  he  were  the  type  that  always  plays 
Stackelberg,  and  can  thus  acquire  the  "reputation"  for  being  a  Stackelberg 
type.   This  intuition  relies  on  the  assumption  that  the  short-run  players  in 
fact  observe  the  long-run  player's  strategy  in  the  stage  games,  and  need  not 
hold  in  sequential-move  games  where  some  actions  by  the  short-run  player  may 
prevent  the  long-run  player  from  acting  at  all. 

Our  work  builds  on  that  of  several  previous  authors,  most  directly  that 
of Kreps-Wilson [1982], Milgrom-Roberts [1982], and Fudenberg-Kreps [1987]
on reputation effects in the chain-store paradox. These papers considered a
long-lived incumbent facing a succession of short-lived entrants, and showed
that if there was a small chance that the incumbent was "tough," it could
deter entry by maintaining a reputation for toughness. Our result improves
on theirs in several ways, all of which stem from the fact that our results
apply to all of the Nash equilibria of the repeated game. First, our
results are robust to further small changes in the information structure of
the game. The earlier arguments depend on the restriction to sequential
equilibria, which, as Fudenberg-Kreps-Levine [1987] have argued, is not
robust to such changes.

Second,  our  proof  is  much  simpler,  and  provides  a  clearer  understanding 
of  the  reputation-effects  phenomenon.   The  point  is  simply  that  since  the 
short-run  players  are  myopic,  they  will  play  as  Stackelberg  followers  in  any 
period  they  attach  a  large  probability  to  the  long-run  player  playing  like  a 
Stackelberg  leader.   We  use  this  observation  to  show  that  if  the  long-run 
player  chooses  his  Stackelberg  action  in  every  period,  there  is  an  upper 
bound  on  the  number  of  times  the  short-run  players  can  fail  to  play  as 
Stackelberg  followers.   This  argument  is  much  simpler  than  the  earlier  ones, 
which  were  obtained  by  characterizing  the  sequential  equilibria.   (The 
earlier papers did, however, obtain characterizations of equilibrium play as
well  as  of  the  equilibrium  payoffs.) 

Third,  because  our  proof  is  simpler,  we  are  able  to  study  a  broader 
class  of  games.   We  consider  arbitrary  specifications  of  the  stage  game,  as 
opposed  to  the  special  case  of  the  chain  store,  and  we  consider  a  more 
general  form  of  the  incomplete  information:   Where  the  earlier  papers 
specified  that  the  long-run  player  had  two  or  three  types,  our  result  covers 
all  games  in  which  the  "Stackelberg  type"  has  positive  probability.   Also, 
our results extend to non-stationary games in which the long-run player has
private information about his payoffs in addition to knowing whether or not
he is a "Stackelberg type."

Our work is also related to that of Kreps-Milgrom-Roberts-Wilson [1982]
and Fudenberg-Maskin [1986] on reputation effects in games where all of the
players are long-lived, and, more closely, to that of Aumann-Sorin [1987].
Kreps-Milgrom-Roberts-Wilson considered the effects of a specific sort of
incomplete information in the finitely-repeated Prisoner's Dilemma, and
showed  that  in  all  sequential  equilibria  the  players  cooperated  in  all  but  a 
few  of  the  periods.   Fudenberg-Maskin  showed  that  for  any  given  individually 
rational  payoffs  of  a  repeated  game  there  is  a  family  of  "nearby"  games  of 
incomplete  information  each  of  which  has  a  sequential  equilibrium  which 
approximates the given payoffs. Our result does not apply to such games
because long-run players need not play short-run best responses: they can
trade off a loss today for a gain tomorrow. This is not the case with a
single long-run player, which is why this paper's results are so different.
The work of Aumann-Sorin is closer to ours in developing bounds on payoffs
that hold uniformly whenever the incomplete information puts probability on
a sufficiently broad class of preferences. To obtain this sort of result with
several  long-run  players,  Aumann-Sorin  require  very  strong  assumptions:   the 
only  stage  games  considered  are  those  of  "pure  coordination,"  and  the 
preferences  of  the  "crazy"  types  must  be  represented  by  "finite  automata 
with  finite  memory." 

2.   THE  SIMPLE  MODEL 

We begin with the simplest model of a long-run player facing a sequence
of  opponents.   The  long-run  player,  player  one,  faces  an  infinite  sequence 
of different short-lived player twos. Each period, player one selects a
strategy from his strategy set $S_1$, while that period's player two selects
a strategy from $S_2$. In this section we assume that players one and two
move simultaneously in each period, so that at the end of the period each
player knows what strategy his opponent used during that period. Section
five considers the complications that arise when the stage game has a
nontrivial extensive form. For the time being we will also assume that the
$S_i$ are finite sets; Section six considers the technical issues involved
when the $S_i$ are allowed to be any compact metric space. Corresponding to
the strategy spaces $S_i$ are the spaces $\Sigma_i$ of mixed strategies.

The unperturbed stage game is a map

$$g: S_1 \times S_2 \to \mathbb{R}^2,$$

which gives each player $i$'s payoff $g_i$ as a function of the realized
actions. In an abuse of notation, we let $g(\sigma) = g(\sigma_1, \sigma_2)$ denote the
expected payoff corresponding to the mixed strategy $\sigma$.

In the unperturbed repeated game $G(\delta)$, the long-run player discounts
his expected payoffs using the discount factor $\delta$, $0 < \delta < 1$.
Specifically, the sequence of payoffs $g_1^1, \ldots, g_1^t, \ldots$ has present value

$$(1-\delta) \sum_{t=1}^{\infty} \delta^{t-1} g_1^t.$$

Each  period's  short-run  player  acts  to  maximize  that  period's  payoff. 
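
As a small numerical illustration of this normalization, the following Python sketch evaluates the present value for a finite truncation of the payoff stream. The function name `present_value` and the truncation itself are assumptions made for illustration and are not part of the model.

```python
def present_value(payoffs, delta):
    """Normalized discounted value (1 - delta) * sum_{t>=1} delta**(t-1) * g_t
    of a finite payoff stream (a truncation of the infinite sum)."""
    if not 0 <= delta < 1:
        raise ValueError("delta must lie in [0, 1)")
    # enumerate starts at 0, which matches the exponent t - 1 for t = 1, 2, ...
    return (1 - delta) * sum(delta ** t * g for t, g in enumerate(payoffs))


# A long constant stream of 2s is worth approximately 2 per period.
value = present_value([2.0] * 2000, 0.9)
```

The factor (1 - delta) expresses the total in per-period units, which is why a constant stream is worth its per-period payoff.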

Both long-run and short-run players can observe and condition their
play at time $t$ on the entire past history of the game. Let $H_t$ denote the
set of possible histories of the game through time $t$, $H_t = (S_1 \times S_2)^t$. A
pure strategy for player one is a sequence of maps $s_1^t: H_{t-1} \to S_1$, and a
pure strategy for a period-$t$ player two is a function $s_2^t: H_{t-1} \to S_2$. Mixed
strategies are $\sigma_i^t: H_{t-1} \to \Sigma_i$. (Note that if the stage game corresponded
to a non-trivial extensive form, then the realized play in the stage game
would not reveal how player one would have played at all of his information
sets, and thus would not reveal player one's choice of a normal-form
strategy for that stage.)

This  game  has  been  studied  by  Fudenberg-Kreps-Maskin  [1987].   We 
summarize  their  results  here  for  the  convenience  of  the  reader. 

Let $B: \Sigma_1 \to \Sigma_2$ be the correspondence that maps mixed strategies by
player one in the stage game $g$ to the best responses of player two.
Because the short-run players play only once, in any equilibrium of $G(\delta)$,
each period's play must lie in the graph of $B$.

Fudenberg-Kreps-Maskin  prove  that  a  kind  of  "folk  theorem"  obtains  for 
games  with  a  single  long-run  player.   Specifically,  let  V   be  the  set  of 
payoffs  for  player  one  attainable  in  the  graph  of  B  when  player  one  is 
restricted  to  pure  strategies.   Let  v   be  player  one's  minimax  value  in 
the  game  in  which  moves  by  player  two  that  are  not  best  responses  to  some 
play  by  player  one  are  deleted.   Then  any  payoff  in  V   that  gives  player 
one at least v can be attained in a sequential equilibrium if $\delta$ is near
enough to one.

The point of this paper is to argue that if the game $G(\delta)$ is
perturbed to allow for a small amount of incomplete information, then there
is a far narrower range of Nash equilibrium payoffs. Roughly speaking, in
the perturbed game the long-run player can exploit the possibility of building
a reputation to pick out the equilibrium he likes best.

Specifically, define

(1)  $$g_1^* = \max_{s_1 \in S_1} \; \min_{\sigma_2 \in B(s_1)} g_1(s_1, \sigma_2),$$

and let $s_1^*$ satisfy

$$\min_{\sigma_2 \in B(s_1^*)} g_1(s_1^*, \sigma_2) = g_1^*.$$

We call $g_1^*$ player one's Stackelberg payoff and $s_1^*$ a Stackelberg
strategy. This differs slightly from the usual formulation, because when
player two has several best responses we choose the best response that
player one likes least, instead of the one that he prefers. In the usual
definition of Stackelberg equilibrium, the follower is assumed to play the
best response that the leader most prefers. This is natural in many games
with a continuum of strategies, because the leader can make the follower
strictly prefer the desired response by making a small change in his
strategy. In our finite-action setting, player one cannot break player
two's indifference in this way. Thus to have a lower bound on player one's
payoff, we need to allow for the case where player two is "spiteful" and
chooses the best response that player one prefers least. Note also that there
may be several Stackelberg strategies. In the next section we provide a
condition on the form of the incomplete information that ensures that player
one's worst Nash equilibrium payoff is close to $g_1^*$ when $\delta$ is close to
one.
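
The pessimistic tie-breaking in this definition can be made concrete with a short Python sketch. It assumes a finite stage game given as two payoff matrices indexed by (row, column); the function name `stackelberg_payoff` and the example matrices are hypothetical. Because player one's payoff is linear in player two's mixture, taking the minimum over pure best responses gives the same value as taking it over all mixed best responses.

```python
def stackelberg_payoff(g1, g2, tol=1e-12):
    """Return (g_star, s_star) for payoff matrices g1[row][col], g2[row][col].

    For each pure row action, player two's pure best responses are the
    columns that maximize g2 in that row; player one is assigned the
    worst of them, and then picks the row with the best such guarantee.
    """
    best = None
    for row, (row1, row2) in enumerate(zip(g1, g2)):
        top = max(row2)
        worst = min(u1 for u1, u2 in zip(row1, row2) if u2 >= top - tol)
        if best is None or worst > best[0]:
            best = (worst, row)
    return best


# Hypothetical game: against row 0, player two is indifferent between columns.
G1 = [[5, 0], [2, 2]]
G2 = [[1, 1], [0, 3]]
g_star, s_star = stackelberg_payoff(G1, G2)
```

In this hypothetical game the usual (optimistic) definition would credit row 0 with a payoff of 5, whereas the spiteful tie-breaking credits it with 0, so row 1 is selected with a guaranteed payoff of 2.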

Before turning to the incomplete-information game, though, we should
clarify the role of mixed strategies for the long-run player. The
Fudenberg-Kreps-Maskin "folk theorem" restricts the long-run player to play
pure strategies, as does our definition of $g_1^*$. Consider Figure 1, which
displays an example from Fudenberg-Kreps-Maskin. In this game, the long-run
player chooses rows and the short-run player chooses columns. If player
one mixes with equal weight on both rows, then a best response for player
two is to play L, giving player one a payoff of four. On the other hand,
restricting player one to pure strategies yields $g_1^* = 1$.


Fudenberg-Kreps-Maskin  show  that  in  any  equilibrium  of  the  repeated  game, 
whether  or  not  mixed  strategies  are  allowed,  the  player  one  payoff  is  no 
more  than  three. 

The problem is that if the short-run players believe that player one
will  mix  with  equal  probabilities  each  period,  player  one  will  always  prefer 
to  play  up  instead  of  down.   If  the  long-run  player  could  arrange  for  his 
choice  of  mixed  strategy  to  be  observed  at  the  end  of  each  period,  then  the 
payoff  of  four  could  be  attained.   Having  an  observable  mixed  strategy  is 
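equivalent to adding a new pure strategy with the same payoffs as the mixed
strategy. Thus, when mixed strategies are observable, the maximum in (1) is
taken over player one's mixed strategies rather than over his pure
strategies alone.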

When might it be reasonable to suppose that mixed strategies are
observable? Imagine that each player can build an infinitely-divisible
lottery  wheel  with  a  known,  independent  probability  distribution  over 
outcomes.   All  wheels  are  spun  in  each  period.   Further,  each  player 
observes  his  wheel  at  the  start  of  each  period  (before  he  chooses  an 
action),  and  his  opponents  observe  that  period's  spin  at  the  end  of  the 
period. (This requires that the player cannot secretly alter the reading of
his wheel.) Then, since any mixed strategy can be implemented by conditioning
on the wheel's outcome, the player's choice of mixed strategy is ex-post
observable.
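
The lottery-wheel construction can be sketched in Python as follows. The wheel is modeled, as an assumption, by a uniform reading in [0, 1), and the mixed strategy is implemented by an inverse-CDF rule; the names `action_from_spin`, `strategy`, and the action labels are illustrative only.

```python
import random


def action_from_spin(spin, mixed_strategy):
    """Map a wheel reading in [0, 1) to an action by the inverse-CDF rule.

    mixed_strategy is a list of (action, probability) pairs summing to one.
    """
    cumulative = 0.0
    for action, prob in mixed_strategy:
        cumulative += prob
        if spin < cumulative:
            return action
    return mixed_strategy[-1][0]  # guard against rounding at the top end


strategy = [("up", 0.5), ("down", 0.5)]
spin = random.random()  # the wheel reading, revealed at the end of the period
played = action_from_spin(spin, strategy)

# An opponent who later sees `spin` can check that `played` is what the
# announced rule prescribes, which is what makes the mixture observable ex post.
consistent = action_from_spin(spin, strategy) == played
```

The point of the sketch is that the rule mapping readings to actions, not just the realized action, becomes verifiable once the reading is public.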

A  striking  fact,  which  we  demonstrate  in  another  paper,  Fudenberg  and 
Levine  [1987b],  is  that  with  the  type  of  perturbation  used  in  this  paper, 
the long-run player can do approximately as well as he can with observable
mixed  strategies,  even  though  mixed  strategies  cannot  be  observed.   That  is, 
if  it  is  possible  to  build  a  reputation  for  playing  a  mixed  strategy,  the 
long-run  player  can  approximate  the  corresponding  payoff  when  he  is 
sufficiently  patient. 

3.   THE  PERTURBED  GAME 

This  section  introduces  the  perturbed  game  and  gives  the  first  version 
of  our  bounds  on  the  long-run  player's  Nash  equilibrium  payoff.   Section 
four  gives  examples  to  show  that  this  bound  cannot  in  general  be  improved 
on,  and  that  there  are  generally  many  Nash  equilibria. 

In  the  perturbed  game,  player  one  knows  his  own  payoff  function,  but 
the  short-run  players  do  not.   We  represent  their  uncertainty  about  player 
one's  payoffs  using  Harsanyi's  [1967]  notion  of  a  game  of  incomplete 
information. Player one's payoff is identified with his "type" $\omega \in \Omega$. It
is common knowledge that the short-run players have (identical) prior
beliefs $\mu$ about $\omega$, represented by a probability measure on $\Omega$. In this
section we restrict attention to perturbed games with a countable number of
types, so that $\Omega = \{\omega_0, \omega_1, \omega_2, \ldots\}$ is a measure space in the obvious way.
Section  six  allows  for  an  uncountable  number  of  types. 

The  period  payoffs  in  the  perturbed  game  are  the  same  as  in  the 
unperturbed game, except that player one's period-$t$ payoff $g_1(s_1, s_2, \omega)$ may
now depend on his type. A pure strategy for player one in the perturbed
game is a sequence of maps $s_1^t: H_{t-1} \times \Omega \to S_1$ specifying his play as a
function of the history and his type; a mixed strategy is
$\sigma_1^t: H_{t-1} \times \Omega \to \Sigma_1$. Otherwise the definition of the perturbed game is the
same as that of the unperturbed game. The perturbed game is denoted $G(\delta, \mu)$
to  emphasize  its  dependence  on  the  long-run  player's  discount  factor  and  on 
the  beliefs  of  the  short-run  players. 

Let type $\omega_0$ have the preferences corresponding to the unperturbed
game $G(\delta)$, so that if player one is type $\omega_0$ then

$$g_1(s_1, s_2, \omega_0) = g_1(s_1, s_2).$$

In some cases it will be most natural for $\mu(\omega_0)$ to be near to unity, in
others it will not; this is inessential for our argument. We will, however,
require $\mu(\omega_0)$ to be strictly positive. In addition, for any action
$s_1 \in S_1$, let $\bar{s}_1$ denote the event that player one is a type $\omega$ whose best
strategy in the repeated game is to play $s_1$ in every period, that is,

$$g_1(s_1, s_2, \bar{s}_1) = g_1(s_1, s_2', \bar{s}_1) > g_1(s_1', s_2', \bar{s}_1)$$
for all $s_1' \ne s_1$ and $s_2, s_2'$.

In other words, player one's payoff is independent of player two's action if
he plays $s_1$, and is strictly more than he can get if he plays any other
strategy. This clearly implies that playing $s_1$ in every period is
strictly dominant in the repeated game. If $s_1$ were merely dominant in the
stage game, it would not necessarily dominate in the repeated game. In the
prisoner's dilemma, for example, defection is dominant at each stage, but
certainly is not a good strategy against a tit-for-tat opponent. Clearly if
repeated play of $s_1$ is strictly dominant in the repeated game, Nash
equilibrium requires that (if $\bar{s}_1$ and $h_{t-1}$ have positive probability)
$\sigma_1^t(h_{t-1}, \bar{s}_1) = s_1$ for all $t$ and almost all $h_{t-1}$. The event $\bar{s}_1^*$ means that
player one strictly prefers to play the "Stackelberg strategy" $s_1^*$. We will
say that this event corresponds to player one being "the" Stackelberg type.

Since  the  perturbed  game  has  countably  many  types  and  periods,  and 
finitely  many  actions  per  type  and  period,  the  set  of  Nash  equilibria  is  a 
closed  non-empty  set.   This  follows  from  the  standard  results  on  the 
existence  of  mixed  strategy  equilibria  in  finite  games,  and  the  limiting 
results  of  Fudenberg  and  Levine  [1983,  1986].   Consequently,  we  may  define 
$\underline{V}_1(\delta, \mu, \omega_0)$ to be the least payoff to a player one of type $\omega_0$ in any Nash
equilibrium of the perturbed game $G(\delta, \mu)$. Observe that the minimum is
taken  over  all  mixed  strategy  equilibria,  and  not  merely  pure  strategy 
equilibria. 

Theorem 1: Assume $\mu(\omega_0) > 0$, and that $\mu(\bar{s}_1^*) \equiv \mu^* > 0$. Then there is a
constant $k(\mu^*)$, otherwise independent of $(\Omega, \mu)$, such that

$$\underline{V}_1(\delta, \mu, \omega_0) \ge \delta^{k(\mu^*)} g_1^* + (1 - \delta^{k(\mu^*)}) \min g_1.$$

This says that if the long-run player is patient relative to the prior
probability $\mu^*$ that he is "tough", then the long-run player can achieve
almost his Stackelberg payoff. Moreover, the lower bound on the long-run
player's payoff is independent of the preferences of the other types in $\Omega$ to
which $\mu$ assigns positive probability. The condition $\mu(\omega_0) > 0$ is
necessary for $\underline{V}_1$ to be well defined. We point out in section six that this
condition is not in fact essential.

Proof: Fix any (possibly mixed) equilibrium $(\sigma_1, \sigma_2)$ of $G(\delta, \mu)$, and
consider the strategy for player one of always playing $s_1^*$. We will show that
player two's equilibrium strategies choose actions outside of $B(s_1^*)$ at most
$k(\mu^*)$ times, where $k(\mu^*)$ is independent of $\delta$ and $\Omega$.

Fix a history $h_t$ which occurs with positive probability, and such that
player one has always played $s_1^*$. Since $\mu^* > 0$ such histories exist. We
shall show that in any such history player two has played outside of $B(s_1^*)$
at most $k$ times. To this end, for $1 \le \tau \le t$, let $h_\tau$ be the history $h_t$
truncated at time $\tau$. Define

$$\pi^*(h_{\tau-1}) \equiv \text{Prob}[s_1^\tau = s_1^* \mid h_{\tau-1}]$$

to be the probability that (any type of) player one plays $s_1^*$ in period $\tau$
conditional on $h_{\tau-1}$. Since $B(s_1^*)$ is the set of best responses to $s_1^*$,
there is a probability $\bar\pi < 1$ such that player two will play an action in
$B(s_1^*)$ whenever $\pi^*(h_{\tau-1}) > \bar\pi$. Let $\Omega^* = \Omega \setminus \bar{s}_1^*$. We show below that
$\pi^*(h_{\tau-1}) \le \bar\pi$ is only possible if $\text{Prob}[s_1^\tau = s_1^* \mid h_{\tau-1}, \Omega^*]$ is no greater
than $\bar\pi$ as well. Thus in any period where a player two does not play a best
response to $s_1^*$ he must expect that the "non-Stackelberg" types are unlikely
to play $s_1^*$. Consequently, by playing $s_1^*$ player one can effect a
non-negligible increase in his opponent's belief that he is indeed the
Stackelberg type.

To make this precise, suppose that in $h_t$ there have been $k$ previous
periods $\tau \in T$ in which $\pi^*(h_{\tau-1}) \le \bar\pi$. Let $h_t^1 = (s_1^*, s_1^*, \ldots, s_1^*)$ be the
history of player one's play, and $h_t^2$ be the history of player two's play; in
other words $h_t = (h_t^1, h_t^2)$. Using Bayes' law,

(3)  $$\text{Prob}(\bar{s}_1^* \mid h_t) = \frac{\mu^* \, \text{Prob}(h_t^2 \mid \bar{s}_1^*)}{\mu^* \, \text{Prob}(h_t^2 \mid \bar{s}_1^*) + (1-\mu^*) \, \text{Prob}(h_t \mid \Omega^*)}.$$

Our goal is to show that

(4)  $$\text{Prob}(h_t \mid \Omega^*) \le \text{Prob}(h_t^2 \mid \bar{s}_1^*) \, \bar\pi^k;$$

that is, the prior probability of a "rational" player one playing $s_1^*$ many
times in a row must be small if player two actually failed to play a best
response to $s_1^*$ $k$ times. Equation (3) and inequality (4)
clearly imply that

(5)  $$\text{Prob}[\bar{s}_1^* \mid h_t] \ge \frac{\mu^*}{\mu^* + (1-\mu^*) \bar\pi^k}.$$

Direct computation then shows that if

(6)  $$k > k(\mu^*) \equiv \frac{\log[\mu^*(1-\bar\pi)/(1-\mu^*)]}{\log \bar\pi},$$

then $\text{Prob}[s_1^{t+1} = s_1^* \mid h_t] \ge \text{Prob}[\bar{s}_1^* \mid h_t] > \bar\pi$. In other words, if player one
has always played $s_1^*$ and there have been $k$ periods in which this was
unexpected in the sense that $\pi^*(h_{\tau-1}) \le \bar\pi$, then player two's posterior
belief that player one will play $s_1^*$, $\text{Prob}[s_1^{t+1} = s_1^* \mid h_t]$, exceeds $\bar\pi$,
and player two will optimally respond by playing in $B(s_1^*)$. Consequently,
by induction, if player one plays $s_1^*$ forever, he will get less than $g_1^*$
at most $k$ times.
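
The direct computation behind (6) can be spelled out as follows. The notation is assumed here for readability: $\bar\pi$ denotes the critical probability, $\mu^*$ the prior weight on the Stackelberg type, and $k$ the number of periods in which play of the Stackelberg action was unexpected.

```latex
\begin{align*}
\frac{\mu^*}{\mu^* + (1-\mu^*)\,\bar\pi^{k}} > \bar\pi
&\iff \mu^*(1-\bar\pi) > (1-\mu^*)\,\bar\pi^{k+1} \\
&\iff (k+1)\log\bar\pi < \log\frac{\mu^*(1-\bar\pi)}{1-\mu^*} \\
&\iff k+1 > \frac{\log\left[\mu^*(1-\bar\pi)/(1-\mu^*)\right]}{\log\bar\pi}.
\end{align*}
```

The last step reverses the inequality because $\log\bar\pi < 0$. Any $k$ exceeding the right-hand side therefore suffices. The first inequality in the chain stated after (6) holds because the Stackelberg type plays the Stackelberg action with probability one, so the probability of that action is at least the posterior probability of that type.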

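To prove (4), observe that

(7)  $$\text{Prob}(h_t \mid \Omega^*) = \text{Prob}(h_t^1, h_t^2 \mid \Omega^*) = \text{Prob}(h_t^2 \mid h_t^1, \Omega^*) \, \text{Prob}(h_t^1 \mid \Omega^*).$$

Moreover, given $h_t^1$, player two's play does not further depend on player
one's type, and it follows that $\text{Prob}(h_t^2 \mid h_t^1, \Omega^*) = \text{Prob}(h_t^2 \mid h_t^1, \bar{s}_1^*)$. Further,
since $h_t^1 = (s_1^*, s_1^*, \ldots, s_1^*)$, $\text{Prob}(h_t^1 \mid \bar{s}_1^*) = 1$, and so
$\text{Prob}(h_t^2 \mid h_t^1, \bar{s}_1^*) = \text{Prob}(h_t^2 \mid \bar{s}_1^*)$. Consequently, to demonstrate (4), it
suffices to find a bound on $\text{Prob}(h_t^1 \mid \Omega^*)$. We calculate

(8)  $$\text{Prob}(h_t^1 \mid \Omega^*) = \text{Prob}(h_t^1, h_{t-1}^1, \ldots, h_1^1 \mid \Omega^*) = \prod_{\tau=1}^{t} \text{Prob}(h_\tau^1 \mid h_{\tau-1}^1, \Omega^*).$$

In periods $\tau$ outside of $T$, when player two plays in $B(s_1^*)$,
$\text{Prob}(h_\tau^1 \mid h_{\tau-1}^1, \Omega^*) \le 1$. In periods $\tau \in T$, we have from the elementary laws
of probability

(9)  $$\pi^*(h_{\tau-1}) = \text{Prob}[s_1^\tau = s_1^* \mid h_{\tau-1}] = \text{Prob}[\bar{s}_1^* \mid h_{\tau-1}] + \text{Prob}[s_1^\tau = s_1^* \mid h_{\tau-1}, \Omega^*] \, \text{Prob}[\Omega^* \mid h_{\tau-1}].$$

Moreover, $\pi^*(h_{\tau-1}) \le \bar\pi$, since $\tau \in T$. From (9) this yields
$\text{Prob}(s_1^\tau = s_1^* \mid h_{\tau-1}, \Omega^*) \le \bar\pi$, because the right side of (9) is a weighted
average of one and this conditional probability. Finally, observe that $h_\tau^1$,
consisting of $s_1^*$ played $\tau$ times in a row, occurs following $h_{\tau-1}^1$,
consisting of $s_1^*$ played $\tau - 1$ times in a row, if and only if $s_1^\tau = s_1^*$.
Consequently

(10)  $$\text{Prob}(h_\tau^1 \mid h_{\tau-1}^1, \Omega^*) = \text{Prob}(s_1^\tau = s_1^* \mid h_{\tau-1}, \Omega^*) \le \bar\pi, \quad \tau \in T.$$

Combining this with (7) and (8) yields (4). ∎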

Note  that  the  same  proof  holds  immediately  for  finitely-repeated  games, 
including the case where $\delta = 1$: if there are enough periods, player one's
average  payoff  cannot  be  much  below  the  Stackelberg  level. 
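
To get a feel for the magnitudes in Theorem 1, the following Python sketch evaluates the lower bound numerically. It is only an illustration under stated assumptions: the threshold probability (`pi_bar`), the prior weight on the Stackelberg type (`mu_star`), and the payoff numbers are hypothetical inputs, and the number of "surprise" periods is computed by searching directly on the posterior bound from the proof rather than from a closed-form expression.

```python
def surprise_bound(mu_star, pi_bar):
    """Smallest integer k >= 0 with mu / (mu + (1 - mu) * pi_bar**k) > pi_bar.

    After that many unexpected plays of the Stackelberg action, the lower
    bound on player two's posterior exceeds the threshold pi_bar.
    Requires 0 < mu_star < 1 and 0 < pi_bar < 1 so that the loop terminates.
    """
    k = 0
    while mu_star / (mu_star + (1 - mu_star) * pi_bar ** k) <= pi_bar:
        k += 1
    return k


def payoff_lower_bound(delta, k, g_star, g_min):
    """delta**k * g_star + (1 - delta**k) * g_min, the form of the bound."""
    return delta ** k * g_star + (1 - delta ** k) * g_min


k = surprise_bound(mu_star=0.01, pi_bar=0.5)  # hypothetical prior and threshold
patient = payoff_lower_bound(0.999, k, g_star=10.0, g_min=0.0)
impatient = payoff_lower_bound(0.9, k, g_star=10.0, g_min=0.0)
```

With these hypothetical inputs the count of surprise periods does not depend on the discount factor, so the bound approaches the Stackelberg payoff as the discount factor approaches one.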

While  we  have  assumed  that  player  two's  payoffs  are  common  knowledge,  a 
simple extension allows us to interpret each player two's choice of $s_2$ as
a choice of a strategy mapping his privately-known type into an action.
Under this interpretation, player one's Stackelberg strategy $s_1^*$ is the one
that  maximizes  player  one's  expected  payoff,  given  that  each  type  of  player 
two  chooses  a  short-run  best  response.   In  the  chain-store  game,  for 
example,  if  there  is  a  sufficiently  high  probability  that  player  two  is  a 
type  which  will  enter  whether  or  not  player  one  is  expected  to  fight,  then 
player  one's  Stackelberg  action  is  to  acquiesce.   (See  Milgrom-Roberts 
[1982]  and  Fudenberg-Kreps  [1987].) 

It  is  also  unimportant  that  player  one's  payoff  be  common  knowledge  in 
the  unperturbed  game.   Let  player  one's  possible  types  in  the  unperturbed 
game be $\theta \in \Theta$, with $g_1 = g_1(s_1, s_2, \theta)$ and $g_2$ independent of $\theta$. Now
construct a perturbed game with type space $\Omega$, and assume that $\Theta \subset \Omega$. For
each $\theta$, let $s_1^*(\theta)$ be that type's Stackelberg action, that is, its most
preferred action in the graph of $B$, and assume that each of the events
$\bar{s}_1^*(\theta)$ has positive prior probability. The proof of theorem 1 shows that
each $\theta$ can approximately attain his Stackelberg payoff $g_1^*(\theta)$ when $\delta$ is near to one.
The results given in section six will cover this extension along with
several  others. 

For notational reasons, we assumed that types in the perturbed game
have stationary payoffs independent of time and history. As can be seen
from the proof of theorem 1, this assumption is irrelevant. Indeed, even
the $\bar{s}_1^*$ type(s) can have non-stationary payoffs, provided that playing $s_1^*$
is strictly dominant for the whole (infinite-horizon) game.

Rather  more  strongly,  even  the  unperturbed  game  need  not  be  stationary. 
In  a  non-stationary  game  we  may  define  the  Stackelberg  payoff  to  be  the 
greatest  average  present  value  obtainable  when  the  short-run  player  plays  a 
passive  best  response  in  every  period.   The  nature  of  the  best  response  may, 
however,  depend  on  either  time  or  history.   The  argument  is  similar  to  that 
above, except that now the critical probability $\bar\pi$ will depend on time and
history. Provided only that it is bounded uniformly away from one, that is,
$\bar\pi_t(h_{t-1}) \le \bar\pi < 1$ for all $t$ and $h_{t-1}$, the proof remains valid. An application
along these lines may be found in example 1 below.

4.   EXAMPLES

We  now  present  some  examples  to  illustrate  the  power  and  the 
limitations  of  our  result.   Example  1  uses  some  variants  of  the  chain-store 
game  to  illustrate  the  advantage  of  our  technique  of  proof:   we  obtain 
asymptotic results without the need to explicitly solve for the sequential
equilibria. Example 2 shows that the equilibria need not be unique.
Example 3 raises a deeper concern: our result in general becomes much
weaker if the stage game does not have simultaneous moves. This may seem
surprising, because the chain-store game considered by Kreps-Wilson [1982] and
Milgrom-Roberts [1982] has sequential moves and not simultaneous ones. As
we  show  in  section  five,  their  positive  results  are  due  to  the  special 
nature  of  the  payoffs  that  they  considered. 

Example  1:   Consider  the  following  version  of  Selten's  [1977]  chain-store 
game.   Each  period,  a  short-run  entrant  decides  whether  or  not  to  enter  and 
the  long-run  incumbent  chooses  whether  to  fight  or  to  acquiesce.   For 
conformity  with  the  assumptions  of  theorem  1,  we  assume  that  these  choices 
are  made  simultaneously,  and  that  at  the  end  of  the  period  the  incumbent's 
choice  is  revealed,  whether  or  not  entry  in  fact  occurred.   The  entrants 
differ  in  two  ways.   First,  each  period's  entrant  is  either  "strong"  or 
"weak".   Strong  entrants  always  enter,  weak  ones  have  payoffs  described 
below.   Each  period's  entrant  is  weak  with  probability  p,   independent  of 
the  others,  and  being  strong  or  weak  is  private  information.   Second,  each 
entrant  is  one  of  three  "sizes,"  big,  medium,  or  small.   Each  entrant  has 
probability   z    of  being  big,   z    of  being  medium-sized,  and  z    of  being 
small,  and  the  entrants'  sizes  are  public  information.   To  preserve  a 
stationary  structure,  we  imagine  that  the  incumbent  learns  the  period  t 
entrant's  size  at  the  start  of  period  t.   Each  weak  entrant  receives  one  if 
it stays out, zero if there is a fight, and two if it enters and the incumbent
acquiesces. Thus weak entrants enter if the probability that they will
be fought is less than one half. The incumbent receives two if it
acquiesces and four if no entry occurs. The incumbent receives $3c/2$ if it
has to fight a small entrant, $c$ if it has to fight a medium one, and zero
if it has to fight a large one. Thus in the unperturbed game, there is an
equilibrium in which all entrants enter and the incumbent acquiesces to all
entry. The previous papers on the chain-store game had only one size of
entrant, and so the only "reputations" the incumbent could want to establish
(that is, the only possible Stackelberg actions) were for always fighting or
for always acquiescing.

To  find  the  Stackelberg  strategy  here,  we  compute  the  difference  in 
payoffs  between  fighting  and  acquiescing,  given  that  the  entrants  play  their 
best  response: 

    entrant size    gain to fighting

    big             $4p - 2$
    medium          $(4-c)p - (2-c)$
    small           $[(8-3c)p - (4-3c)]/2$

If, for example, $c = 1$ and $1/3 < p < 1/2$, the Stackelberg strategy is to
fight  the  small  and  medium-sized  entrants,  and  acquiesce  to  the  large  ones. 
If the only types of the incumbent are the original one $\omega_0$ and a type
that will always fight, then this reputation need not be credible: the
first time the incumbent concedes to a large entrant he reveals that he is
not the type that always fights. But if there is a non-zero prior probability
that the incumbent is a type that acquiesces only to large entrants, then the
incumbent can develop a reputation for playing in this way.
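
The gains to fighting for each entrant size can be checked with a short Python sketch built from the payoffs stated in the example: the incumbent gets four when a weak entrant stays out, the stated fight payoff when a strong entrant enters and is fought, and two when it acquiesces to entry. The function name and the particular value p = 2/5 are illustrative choices; any p strictly between 1/3 and 1/2 gives the same signs when c = 1.

```python
from fractions import Fraction


def gain_to_fighting(p, fight_payoff):
    """Per-period gain from a commitment to fight rather than acquiesce.

    Weak entrants (probability p) stay out, giving the incumbent 4; strong
    entrants enter and are fought, giving fight_payoff. Acquiescing leads
    all entrants to enter, giving 2.
    """
    return 4 * p + (1 - p) * fight_payoff - 2


c = Fraction(1)
p = Fraction(2, 5)  # illustrative value with 1/3 < p < 1/2
fight_payoffs = {"big": Fraction(0), "medium": c, "small": 3 * c / 2}
gains = {size: gain_to_fighting(p, f) for size, f in fight_payoffs.items()}
fight = {size for size, gain in gains.items() if gain > 0}
```

Exact fractions are used so that the comparison with the closed-form expressions is not affected by rounding.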

We can modify example 1 in several ways. First, let us consider how
our result extends to nonstationary environments. Imagine that there are
only medium-sized entrants (with c = 1 as before) and that p = 0.01 in
odd-numbered periods and p = 0.60 in even-numbered periods. Player one's
best constant strategy is then to always acquiesce, because the cost of
fighting in the odd periods (a gain of -0.97) outweighs the potential gain
(0.80) from entry deterrence in the even ones. However, if we allow player
one the opportunity to maintain a reputation for nonstationary play, he can
do much better. If the prior distribution on $\Omega$ assigns positive
probability to a type of player one that fights in even periods and
acquiesces in the odd ones, the proof of theorem 1 extends in the obvious way.
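
The arithmetic behind this comparison can be sketched as follows. The code uses the medium-entrant gain (4-c)p - (2-c) with c = 1 and, as a simplification of ours, compares undiscounted totals over one odd/even pair of periods; the strategy labels are illustrative.

```python
from fractions import Fraction


def medium_gain(p, c=1):
    """Per-period gain to fighting a medium entrant: (4 - c)p - (2 - c)."""
    return (4 - c) * p - (2 - c)


odd_gain = medium_gain(Fraction(1, 100))   # p = 0.01 in odd periods
even_gain = medium_gain(Fraction(3, 5))    # p = 0.60 in even periods

# Undiscounted gain over one odd/even pair, relative to always acquiescing.
cycle_gain = {
    "always acquiesce": Fraction(0),
    "always fight": odd_gain + even_gain,
    "fight in even periods only": even_gain,
}

best_constant = max(["always acquiesce", "always fight"], key=cycle_gain.get)
best_overall = max(cycle_gain, key=cycle_gain.get)
print(float(odd_gain), float(even_gain), best_constant, best_overall)
```

Under these assumptions the best constant rule is to acquiesce, while the nonstationary rule that fights only in even periods does strictly better.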

Next, imagine that the incumbent plays the chain-store game against two
simultaneous sequences of entrants: each period the incumbent faces one
entrant in market A and another in market B. Each entrant's payoff
depends only on play in its own market, but all entrants observe previous
play in both markets, so that the markets are "informationally linked" in
the sense of Fudenberg-Kreps [1987]. Entrants in market A are "weak" with
probability $P_A$, each independent of the others, while entrants in market
B are strong with probability $P_B$. Fudenberg-Kreps, in a similar but more
complex setting, assume that the entrants believe that the incumbent either
has the payoffs of the unperturbed version of example 1 or will fight all
comers, so that once the incumbent quits in one market he is revealed as
weak in both of them. However, if the perturbed game puts positive weight
on the incumbent fighting in only one of the markets, he is free to develop
that reputation.


Example 2: Consider the 2x2 game.

            L         R

   U       1,1       0,0

   D       0,0      10,10

Figure 2
If $\mu(\omega = \omega_0)$ is near to one, then, regardless of $\delta$, the game
$G(\delta,\mu)$ has several sequential equilibria, all of which satisfy our bound.
For concreteness, suppose that $0 < \epsilon < 1/11$, and that $\mu$ puts weight
$1-\epsilon$ on $\omega_0$, weight $\epsilon$ on $s^*$ (the player who always plays D),
and no weight on any other types. One equilibrium of this game has the
short-run players playing R regardless of history, and both types of player
one playing D regardless of history. Another equilibrium has the first
period's player two playing L, and type $\omega_0$ player one playing U in the
first period. Play from the second period on matches that in the first
equilibrium. Neither player one nor the first period's player two has an
incentive to deviate, as the first-period actions are a static Nash
equilibrium, and not deviating gives player one his highest possible payoff
from period two on. There is no point in player one initially imitating the
$s^*$ type, since he will do just as well beginning next period anyway.

Yet a third equilibrium has type $\omega_0$ reverting to U and player two
to L after period T. This is an equilibrium provided T is large enough
and $\delta$ small enough that $\delta^{T+1}/(1-\delta) < 1/9$. In this case, the
loss to player one from playing D in period one (of one) exceeds the
potential gain from convincing player two that $\omega = s^*$ (of
$9\delta^{T+1}/(1-\delta)$). Moreover, once player one has revealed in period one
(by playing U) that he is not type $s^*$ he cannot later build a reputation
for being this type.
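
Two of the conditions used in this example can be verified directly. The sketch below is ours: it takes the payoffs of Figure 2 with rows U, D and columns L, R, checks that the first-period player two prefers L exactly when the weight on the always-D type is below 1/11, and evaluates the condition $9\delta^{T+1}/(1-\delta) < 1$. The function names are hypothetical.

```python
from fractions import Fraction

# Stage-game payoffs (player one, player two) from Figure 2.
PAYOFFS = {
    ("U", "L"): (1, 1), ("U", "R"): (0, 0),
    ("D", "L"): (0, 0), ("D", "R"): (10, 10),
}


def player_two_prefers_L(eps):
    """Type omega_0 (weight 1 - eps) plays U; type s* (weight eps) plays D."""
    payoff_L = (1 - eps) * PAYOFFS[("U", "L")][1] + eps * PAYOFFS[("D", "L")][1]
    payoff_R = (1 - eps) * PAYOFFS[("U", "R")][1] + eps * PAYOFFS[("D", "R")][1]
    return payoff_L > payoff_R


def imitation_gain(delta, T):
    """Discounted gain 9 * delta^(T+1) / (1 - delta) from earning 10 instead of 1."""
    return 9 * delta ** (T + 1) / (1 - delta)


def imitation_unprofitable(delta, T):
    """True when the one-unit first-period loss from imitating s* exceeds the gain."""
    return imitation_gain(delta, T) < 1


print(player_two_prefers_L(Fraction(1, 12)), imitation_unprofitable(Fraction(1, 2), 5))
```

For a fixed T the condition fails once $\delta$ is close enough to one, which is consistent with the bound of theorem 1 applying only to patient players.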

Example 3: This example has the same extensive form as the sequential-move
version of the chain store game, but different payoffs. Player two begins
by choosing whether or not to purchase a good from player one. If he
chooses not to buy, both players receive zero. If he buys, player one can
produce high quality or low. High quality gives each player one, while low
quality gives player one a payoff of three and player two a payoff of zero
(see Figure 3). The Stackelberg outcome here is for player one to "promise"
to choose high quality, so that all the customers will come in. Thus if
theorem 1 extended to this game it would say that if there is a positive
prior probability $\mu^*$ that player one is a type that always produces high
quality, then if $\delta$ is near enough to unity, player one's payoff must be
very close to one in any Nash equilibrium. However, this extension is
clearly false. Take $\mu^* = .01$ and $\mu(\omega_0) = .99$, and specify that the
"sane" type $\omega_0$ always produces low quality, so the player twos never buy.
Given their pessimistic beliefs, the player twos are correct not to buy and
so player one does not have the opportunity to demonstrate that he will
produce high quality. If the player twos were a single long-run player, they
might be tempted to purchase once or twice to gather information, but myopic
player twos will not make this investment. Thus we see that for general
stage games it is not true that player one can ensure almost his Stackelberg
payoff.

We have two responses to the problem posed by example 3. The first is
to follow Fudenberg-Maskin and Fudenberg-Kreps-Levine and examine
perturbations with the property that every information set of the stage game
is reached with positive probability. (See Fudenberg-Kreps-Levine for an
explanation of the perturbations involved.) In this case once again we know
that by playing his Stackelberg action $s^*$ in every period, player one can
eventually force play to the Stackelberg outcome.

Our  second  response  is  developed  in  the  next  section,  which  gives  a 
lower  bound  on  player  one's  payoff  that  holds  for  general  games. 

Before developing that argument, let us explain why the problem raised
in example 3 does not arise in the chain store paradox (Figure 4). There,
the one action the entrant could take that "hid" the incumbent's strategy
was precisely the action corresponding to the Stackelberg outcome: whenever
the entrant did not play like a Stackelberg follower, the incumbent had a
method of demonstrating that the entrant's play was "mistaken". The proof
of theorem 1 invoked Bayesian inference only in those periods where $s_2$
does not belong to $B(s^*)$; whether the incumbent's strategy is revealed in
periods where $s_2 \in B(s^*)$ is irrelevant.

Figure 3: A Quality Game

Figure 4: The Classical Chain Store Paradox

5.   GENERAL  FINITE  STAGE  GAMES

This  section  develops  two  extensions  of  the  basic  argument  of  theorem 
1.   Section  5a  treats  the  case  with  several  interacting  short-run  players  in 
each  period,  and  section  5b  handles  general  but  finite  two-player  stage 
games.   We  defer  the  technical  complications  posed  by  uncountably  many 
actions  and  types  to  section  six. 

5a.   Several  Short-Run  Players

Imagine now that the stage game is a finite n-player simultaneous move
game

$$g: S_1 \times S_2 \times \cdots \times S_n \to \mathbb{R}^n,$$

with player one the only long-run player. Since each of the short-run
players must play a short-run best response, in equilibrium each period's
outcome must lie in the best response sets of all of the short-run players,
that is, it must be a Nash equilibrium in the (n-1) player game induced by
fixing a (possibly mixed) action by player one. Let
$B: \Sigma_1 \to \Sigma_2 \times \cdots \times \Sigma_n$ be this Nash
correspondence. Note that this definition of $B(s_1)$ agrees with our
previous one for the case of a single short-run player. The corresponding
definition of $g^*$ is thus

$$g^* = \max_{s_1} \, \min_{\sigma_{-1} \in B(s_1)} g_1(s_1, \sigma_{-1}).$$

As before, let $s^*$ be a Stackelberg action for player one, that is, an
action that attains $g^*$.
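
To make the max-min definition concrete, here is a minimal sketch for the special case of a single short-run player, restricted (as a simplifying assumption of ours) to pure actions for player one and pure best responses for player two. It is applied to the payoffs of Figure 2; the function names are hypothetical.

```python
def best_responses(g2, s1, actions2):
    """B(s1): player two's pure best responses to the pure action s1."""
    best = max(g2[(s1, s2)] for s2 in actions2)
    return [s2 for s2 in actions2 if g2[(s1, s2)] == best]


def stackelberg(g1, g2, actions1, actions2):
    """Return (s*, g*), with g* = max over s1 of min over B(s1) of g1(s1, s2)."""
    worst_case = {
        s1: min(g1[(s1, s2)] for s2 in best_responses(g2, s1, actions2))
        for s1 in actions1
    }
    s_star = max(worst_case, key=worst_case.get)
    return s_star, worst_case[s_star]


# Payoffs of Figure 2 (rows U, D for player one; columns L, R for player two).
g1 = {("U", "L"): 1, ("U", "R"): 0, ("D", "L"): 0, ("D", "R"): 10}
g2 = {("U", "L"): 1, ("U", "R"): 0, ("D", "L"): 0, ("D", "R"): 10}

s_star, g_star = stackelberg(g1, g2, ["U", "D"], ["L", "R"])
print(s_star, g_star)
```

The inner minimum matters only when player two is indifferent among several best responses; with several short-run players the set B(s1) would instead be the set of Nash equilibria of the induced game, which this sketch does not compute.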

This situation is much the same as with a single short-run player, and
as one would expect, the approach of theorem 1 can be readily extended.
There is only one minor complication: in the proof of theorem 1 we argued
that since the set $B(s^*)$ contained all the best responses to $s^*$,
there was a probability $\pi < 1$ such that if player one was expected to play
$s^*$ with probability exceeding $\pi$, player two would choose an action in
$B(s^*)$. With several short-run players, the Nash correspondence $B(\cdot)$
need not be constant in the neighborhood of $s^*$, but it is still upper
hemicontinuous. Thus, when player one plays $s^*$ and his opponents play a
Nash equilibrium for some $\sigma_1$ near to $s^*$, the lowest that player one's
payoff can be is approximately $g^*$.

As before, let $\omega_0$ be the type whose payoffs are as in the unperturbed
game, and $s^*$ be the event that player one has "$s^*$ forever" as a best
strategy.


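Theorem 2: Fix a game $G(\delta)$ with several short-run players and consider
a perturbed version $G(\delta,\mu)$. Assume $\mu(\omega_0) > 0$, and that
$\mu(s^*) \equiv \mu^* > 0$. Then for all $\epsilon > 0$, there is a
$\underline{\delta} < 1$ such that for all $\delta \in (\underline{\delta},1)$

$$V_1(\delta,\mu,\omega_0) \ge (1-\epsilon)g^* + \epsilon \min g_1.$$

Moreover, $\underline{\delta}$ depends on $\mu$ only through $\mu^*$.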

Proof: For any $\pi \in (0,1]$ let $B(\pi,s^*)$ be the set of all mutual best
responses by the short-run players to each other and to any strategy for
player one that puts probability at least $\pi$ on $s^*$:

$$B(\pi,s^*) = \bigcup_{\sigma_1 :\, \sigma_1(s^*) \ge \pi} B(\sigma_1),$$

and let

$$d(\pi) = \inf_{\sigma_{-1} \in B(\pi,s^*)} g_1(s^*,\sigma_{-1})$$

be the function that bounds how low player one's payoff can be when he plays
$s^*$ and the short-run players choose actions in $B(\pi,s^*)$. Clearly
$d(1) = g^*$, and $d(\pi)$ is non-decreasing. We further claim that $d(\pi)$ is
continuous in the neighborhood of $\pi = 1$. To see this, suppose to the
contrary that there is an $\epsilon > 0$ and a sequence $\pi^n \to 1$ such that for
all n, $d(\pi^n) < g^* - \epsilon$. Then there is a sequence
$\sigma_{-1}^n \in B(\pi^n,s^*)$ such that $g_1(s^*,\sigma_{-1}^n) < g^* - \epsilon$.
Extracting a convergent subsequence from the $\sigma_{-1}^n$, and using the upper
hemicontinuity of $B(\cdot)$, we conclude that there is a
$\sigma_{-1} \in B(s^*)$ with $g_1(s^*,\sigma_{-1}) \le g^* - \epsilon$, which
contradicts the definition of $g^*$.

To prove the theorem, fix an $\epsilon > 0$, and choose $\pi$ such that
$d(\pi) > (1-\epsilon)g^*$. As in the proof of theorem 1, if player one chooses the
strategy of always playing $s^*$, there is a bound $k(\mu^*,\pi)$, independent of
$\delta$, on how many times the short-run players can choose actions that are not
in $B(\pi,s^*)$. Finally, take $\delta$ large enough that $\delta^k$ exceeds
$(1-\epsilon)$. ∎
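
The last step of the proof can be illustrated numerically. The sketch below is ours and treats the bound k as a given input, since no formula for it is stated in this section; it computes the smallest $\delta$ with $\delta^k \ge 1-\epsilon$ and evaluates a bound of the form $\delta^k \cdot \text{good} + (1-\delta^k) \cdot \text{worst}$, where the "good" and "worst" payoffs and all numerical values are assumptions chosen only for illustration.

```python
def discount_threshold(k, eps):
    """Smallest delta with delta**k >= 1 - eps, namely (1 - eps)**(1 / k)."""
    return (1 - eps) ** (1 / k)


def lower_bound(delta, k, good, worst):
    """Payoff bound delta^k * good + (1 - delta^k) * worst for normalized payoffs."""
    weight = delta ** k
    return weight * good + (1 - weight) * worst


# Hypothetical inputs: at most k = 10 "bad" periods, eps = 0.05,
# a Stackelberg-type payoff of 10 and a worst stage payoff of 0.
k, eps = 10, 0.05
threshold = discount_threshold(k, eps)
for delta in (0.90, threshold, 0.999):
    print(round(delta, 4), round(lower_bound(delta, k, 10.0, 0.0), 4))
```

Because k does not depend on $\delta$, the weight $\delta^k$ tends to one as $\delta$ tends to one, so the bound approaches the "good" payoff for a sufficiently patient player one.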

Our results use the hypothesis that player one's opponents are
short-lived only to ensure that they always play myopically. Thus, theorem 2
extends to games where a "large" player one faces a continuum of "small"
opponents, with the (non-innocuous) assumption that no player can observe
the actions of a set of opponents of measure zero. This makes precise a
sense in which being infinitely larger than one's opponents is the same as
being infinitely more patient than they are.

The large- and small-players case differs from the long- and
short-lived one in that our results for the latter hold for any discount
factor, while for the former they hold only in the continuum of players
limit. An exact analogy between the cases would require that there be a bound
on player one's payoffs when he faces small but not infinitesimal opponents,
but this is not possible without further assumptions. The difficulty is
that when players are small but not infinitesimal, they can have a large
influence on equilibrium play. This is why the assumption that measure-zero
deviations are ignored is not innocuous. This is discussed in
Fudenberg-Levine [1987a].

5b.   General  Deterministic  Stage  Games

Here we take up the point raised by example 3. If the stage game is
not simultaneous-move, the long-run player may not have the opportunity to
develop the reputation he would desire. For simplicity we return to the
case of a single short-run player, player two. Let the stage game be a
finite extensive form of perfect recall without moves by Nature. As in
example 3, the play of the stage game need not reveal player one's choice of
normal-form strategy $s_1$. However, when both players use pure strategies
the information revealed about player one's play is deterministic. Let
$O(s_1,s_2) \subseteq S_1$ denote the strategies $s_1'$ of player one such that
$(s_1',s_2)$ leads to the same terminal node as $(s_1,s_2)$. We will say that
these strategies are observationally equivalent, meaning that the player
twos do not observe player one's past normal-form strategies but only the
realized outcomes. In example 3, player one's Stackelberg action $s^*$ was to
produce high quality but, given that player two chooses not to risk dealing
with him, player one had no way of establishing a reputation for Stackelberg
play. The problem was that while "do not buy" was not a best response to
$s^*$, "do not buy" is a best response to "low quality", and high and low
quality are observationally equivalent when player two does not purchase.

This suggests the following generalization of theorem 1: For any $s_1$,
always playing $s_1$ should eventually force player two to play a strategy
$s_2$ which is a best response to an $s_1'$ in $O(s_1,s_2)$. That is, for each
$s_1$ let $W(s_1)$ satisfy

$$W(s_1) = \{\, s_2 \mid \exists\, \sigma_1' \text{ with } \mathrm{supp}\, \sigma_1' \subseteq O(s_1,s_2) \text{ such that } s_2 \in B(\sigma_1') \,\}.$$

In other words, $W(s_1)$ is the set of best responses for player two to
beliefs about player one's strategy that are consistent with the
information revealed when that response is played. Then if $\delta$ is near to
one, player one should be able to ensure approximately

$$g^* = \max_{s_1} \, \min_{s_2 \in W(s_1)} g_1(s_1,s_2).$$

As before, let $s^*$ be a strategy that attains $g^*$ and let $s^*$ also denote
the event that player one's best strategy in $G(\delta,\mu)$ is to always play
$s^*$. Note that if the stage game is simultaneous move, $W(s_1) = B(s_1)$, and
the definitions of $g^*$ and $s^*$ reduce to those we gave earlier.

Theorem 3: Let g be as described above. Assume $\mu(\omega_0) > 0$ and that
$\mu(s^*) = \mu^* > 0$. Then there is a constant $k(\mu^*)$, independent of
$\mu$ except through $\mu^*$, such that

$$V_1(\delta,\mu,\omega_0) \ge \delta^{k(\mu^*)} g^* + (1-\delta^{k(\mu^*)}) \min g_1.$$

Before giving the proof, let us observe that this result, while not as
strong as the assertion in theorem 1 that player one can pick out his
preferred payoff in the graph of B, does suffice to prove that player one
can develop a reputation for "toughness" in the sequential-move version of
the chain store game. Consider the extensive form in Figure 4 above. In
this game B(fight) = {out} and B(acquiesce) = {in}. Also,
O(fight,out) = O(acquiesce,out) = {acquiesce, fight}, while
O(fight,in) = {fight} and O(acquiesce,in) = {acquiesce}.

First, we argue that W(fight) = B(fight). To see this observe that
W(fight) ⊇ B(fight) = {out}. Moreover, "in" is not a best response to
"fight", and "acquiesce" is not observationally equivalent to "fight" when
player two plays "in". Consequently, no strategy placing positive weight on
"in" is in W(fight).

Finally, since player one's Stackelberg action with observable
strategies is fight, and W(fight) = B(fight), the fact that only player
one's realized actions, and not his strategy, are observable does not lower
our bound on player one's payoff.
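
The sets O and W can be computed mechanically for small games. The sketch below is ours and rests on labeled assumptions: beliefs are searched over a finite grid on the simplex (exact here, approximate in general), and the chain-store payoffs for the entrant (0 if out, -1 if fought, 1 if accommodated) are hypothetical, since Figure 4 is not reproduced in the text. The quality-game payoffs for player two follow the description of example 3.

```python
from fractions import Fraction
from itertools import product


def simplex_grid(n, steps=10):
    """Probability vectors of length n with entries that are multiples of 1/steps."""
    for combo in product(range(steps + 1), repeat=n):
        if sum(combo) == steps:
            yield [Fraction(c, steps) for c in combo]


def observationally_equivalent(outcome, s1, s2, S1):
    """O(s1, s2): strategies of player one reaching the same terminal node as (s1, s2)."""
    return [t for t in S1 if outcome[(t, s2)] == outcome[(s1, s2)]]


def is_best_response(s2, belief, support, g2, outcome, S2):
    """True if s2 maximizes player two's expected payoff against the belief."""
    def expected(action):
        return sum(q * g2[outcome[(t, action)]] for q, t in zip(belief, support))
    return all(expected(s2) >= expected(a) for a in S2)


def W(s1, S1, S2, outcome, g2):
    """Responses that are optimal for some belief supported on O(s1, s2)."""
    result = []
    for s2 in S2:
        support = observationally_equivalent(outcome, s1, s2, S1)
        if any(is_best_response(s2, b, support, g2, outcome, S2)
               for b in simplex_grid(len(support))):
            result.append(s2)
    return result


# Sequential chain-store game; entrant payoffs are hypothetical.
chain_S1, chain_S2 = ["fight", "acquiesce"], ["in", "out"]
chain_outcome = {("fight", "in"): "fought", ("acquiesce", "in"): "accommodated",
                 ("fight", "out"): "no entry", ("acquiesce", "out"): "no entry"}
chain_g2 = {"fought": -1, "accommodated": 1, "no entry": 0}

# Quality game of example 3; player two's payoffs as described in the text.
qual_S1, qual_S2 = ["high", "low"], ["buy", "dont"]
qual_outcome = {("high", "buy"): "high", ("low", "buy"): "low",
                ("high", "dont"): "no sale", ("low", "dont"): "no sale"}
qual_g2 = {"high": 1, "low": 0, "no sale": 0}

print(W("fight", chain_S1, chain_S2, chain_outcome, chain_g2))
print(W("high", qual_S1, qual_S2, qual_outcome, qual_g2))
```

Under these assumptions W(fight) contains only "out", matching B(fight), whereas W(high) in the quality game also contains "dont", which is why the bound based on W is weaker there.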

Proof of Theorem 3: Once again we fix an equilibrium $(\sigma_1,\sigma_2)$ and
consider the strategy for player one of always playing $s^*$. Let
$\pi(h_{t-1})$ be the probability distribution over $S_1$ that player two
expects player one to use in period t. (This is computed from player two's
initial beliefs $\mu$, the observed history $h_{t-1}$, and player one's
equilibrium strategy in the usual way.) If $s_2 \notin W(s^*)$, then there is
a $\pi(s_2)$ such that $s_2$ is not a best response to any $\sigma_1$ with
$\sigma_1(O(s^*,s_2)) > \pi(s_2)$. Let $\bar\pi = \max_{s_2} \pi(s_2)$. Each time
player two plays an $s_2$ outside of $W(s^*)$, the observed outcome will be
one that had prior probability less than $\bar\pi$. Thus, as in the proof of
theorem 1, the probability that player one is the type that plays $s^*$
increases by a non-negligible amount. The rest of the proof is the same as
before. ∎
6.   GAMES  WITH  A  CONTINUUM  OF  STRATEGIES

We  turn  attention  now  to  the  case  where  players  have  a  continuum  of 
strategies  in  each  period.   Two  complications  arise  in  the  analysis.   First, 
it  is  no  longer  true  that  merely  because  the  short  run  player  puts  a  large 
probability  weight  on  the  Stackelberg  strategy,  he  must  play  a  best  response 
to  it.   However,  he  must  play  an  e-best  response,  and  this  is  sufficient  for 
our  purposes.   Second,  it  is  not  sensible  to  suppose  that  a  priori    the  short 
run  player  places  positive  weight  on  the  Stackelberg  strategy:   instead  we 
assume  that  all  neighborhoods  of  the  Stackelberg  strategy  have  positive 
probability.   Nash  equilibrium,  then,  does  not  require  the  short  run  player 
to  optimize  against  the  Stackelberg  type,  since  that  type  occurs  with 
probability zero. Instead, we must work with a sequence of types that
converges to the Stackelberg type.

We return to the basic simultaneous move model. Our description of the
perturbed and unperturbed game is unchanged with two exceptions. First, $S_1$
and $S_2$ are now assumed to be compact metric spaces, rather than finite
sets, and $\Omega$ may be an arbitrary measure space. Second, the payoff maps
$g_1 : S_1 \times S_2 \times \Omega \to \mathbb{R}$ and
$g_2 : S_1 \times S_2 \to \mathbb{R}$ are assumed to be continuous on
$S_1 \times S_2$. The definition of Stackelberg strategies, payoffs and types
remains unchanged.

In order to deal with the continuum case, we need to consider $\epsilon$-best
responses by player two. Define $B_\epsilon : \Sigma_1 \to \Sigma_2$ to be the
correspondence that maps mixed strategies by player one in the stage game g
to $\epsilon$-best responses of player two. That is, if
$\sigma_2 \in B_\epsilon(\sigma_1)$, then
$g_2(\sigma_1,\sigma_2) + \epsilon \ge g_2(\sigma_1,\sigma_2')$ for all
$\sigma_2' \in \Sigma_2$. Let d denote the distance metric. Recall that the
Stackelberg strategy $s^*(\omega)$ depends on the type of player one,
$\omega \in \Omega$. We define a corresponding version of the Stackelberg
payoff

$$g^*(\epsilon,\omega) = \min_{\substack{\sigma_2 \in B_\epsilon(s_1) \\ d(s_1,s^*(\omega)) \le \epsilon}} g_1(s_1,\sigma_2,\omega).$$

In other words, we allow $\epsilon$-best responses to strategies that differ from
$s^*(\omega)$ by up to $\epsilon$. We also emphasize the dependence of $g^*$ on
$\omega$. If for some $\omega$ there are several Stackelberg strategies
$s^*(\omega)$, each will generally yield a different function
$g^*(\epsilon,\omega)$. However, all versions have one key feature in common.

Lemma 4: $\lim_{\epsilon \to 0} g^*(\epsilon,\omega) = g^*(\omega)$.


Proof: Fix $\omega$. If the lemma fails, there exists a sequence
$\epsilon^n \to 0$, $s_1^n \to s^*$, and $\epsilon^n$-best responses $\sigma_2^n$ to
$s_1^n$ such that $\lim g_1(s_1^n,\sigma_2^n) < g^*$. Since $S_2$ is compact,
$\Sigma_2$ is weakly compact, and we may assume $\sigma_2^n \to \sigma_2$. Since
$g_1$ is weakly continuous,
$\lim g_1(s_1^n,\sigma_2^n) = g_1(s^*,\sigma_2) < g^*$. From the definition of
$g^*$ it is clear that $\sigma_2$ cannot be a best response to $s^*$; that is,
there exists $\sigma_2^*$ with
$g_2(s^*,\sigma_2^*) > g_2(s^*,\sigma_2) + \epsilon$ for some $\epsilon > 0$.
However, since $g_2$ is weakly continuous, we have
$g_2(s_1^n,\sigma_2^*) \to g_2(s^*,\sigma_2^*)$ and
$g_2(s_1^n,\sigma_2^n) \to g_2(s^*,\sigma_2)$. This implies that, for n
sufficiently large,

$$g_2(s_1^n,\sigma_2^*) > g_2(s_1^n,\sigma_2^n) + \epsilon/2 > g_2(s_1^n,\sigma_2^n) + \epsilon^n,$$

contradicting the fact that $\sigma_2^n$ is an $\epsilon^n$-best response to
$s_1^n$. ∎

To prove that as $\delta \to 1$ the long-run player can get nearly the
Stackelberg payoff, we must describe our assumption on $\mu$, the
distribution over types. Consider, for each $\omega$, the set of strategies
$s_1$ with $d(s_1,s^*(\omega)) < d$. Corresponding to these are types $s_1$ who
surely play the corresponding strategy. Define $\mu^*(d,\omega)$ to be the
probability assigned to these types by $\mu$. We can prove

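Theorem 5: For a set $\Omega_0$ occurring with positive probability assume that
$\inf_{\omega_0 \in \Omega_0} \mu^*(d,\omega_0) > 0$ for each $d > 0$. Then for
almost all $\omega_0 \in \Omega_0$ and all $\epsilon > 0$ there exists a
$\underline{\delta} < 1$ such that for all $\delta \in (\underline{\delta},1)$

$$V_1(\delta,\mu,\omega_0) \ge (1-\epsilon)g^*(\omega_0) + \epsilon \min g_1.$$

Moreover, $\underline{\delta}$ depends on $\mu$ only through $\mu^*(\cdot)$.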

Proof: The reason that the theorem holds only for almost all
$\omega_0 \in \Omega_0$ is that $V_1(\delta,\mu,\omega_0)$, a conditional
expectation, is only defined and need only be chosen optimally by player one
almost everywhere. Fixing such an $\omega_0$, the proof is essentially the same
as that of theorem 1. In the proof of that theorem, we made use of the fact
that there is a history $h_t$ that occurs with positive probability and such
that player one has always played $s^*$ to conclude that player two must
respond optimally given the posterior probabilities based on $h_t$. Choosing
a small number $\epsilon'$, notice that the probability of $s_1$ with
$d(s_1,s^*) < \epsilon'$ is positive. It follows that for some $s_1$ with
$d(s_1,s^*) < \epsilon'$, player two must respond optimally following the
history $h_t$ that results when $s_1$ is played repeatedly. In particular,
since of the types $s_1'$ with $d(s_1',s_1) < \epsilon'$ it is almost surely true
in equilibrium that only type $s_1$ will actually play $s_1$, we can choose
$s_1$ so that after observing $s_1$ played, the type $s_1$ will have positive
conditional probability. In the proof of theorem 1, we also made use of the
fact that for some $\pi < 1$ player two will play an action in $B(s^*)$
whenever $\pi^*(h_{t-1}) > \pi$. The corresponding fact here is that for each
$\epsilon'$, there is some $\pi < 1$ such that player two will play an action in
$B_{\epsilon'}(s_1)$ whenever $\pi^*(h_{t-1}) > \pi$. With these adaptations, the
proof of theorem 1 shows that

$$V_1(\delta,\mu,\omega_0) \ge \delta^k g^*(\epsilon',\omega_0) + (1-\delta^k) \min g_1,$$

where k depends only on $\epsilon'$. The theorem now follows from lemma 4. ∎

Let us conclude by observing that theorem 5 can be extended along the
lines of Section 5b to cover general stage games. This simply involves
introducing the set $W_\epsilon(s_1)$ of strategies of player two that are an
$\epsilon$-best response to beliefs that are consistent with the information
revealed when that response is played. Then $s^*(\omega)$ and $g^*(\omega)$ are
defined in the obvious way, and we proceed as above.

This extension allows us to treat the sequential play of the
Kreps-Wilson [1982] two-sided predation game, which was analyzed in
Fudenberg-Kreps [1987]. The stage game is played on the interval [0,1], with
each player choosing a time at which to concede if the other player is still
fighting. If the player twos are unlikely to be "tough", the Stackelberg
action for player one is to commit to fight until the finish (t = 1), which
induces the "weak" player twos to concede immediately. This game is not
simultaneous-move: if the first player two concedes immediately, the others
will not learn how long player one would have been willing to fight him.
However, as in the simple predation game of example 2, this does not pose
additional complications, because whenever a player two does not play as a
Stackelberg follower player one will have a chance to demonstrate that he is
"tough". That is, $B_\epsilon(t=1) = W_\epsilon(t=1)$. We conclude that if player
one is patient he can do almost as well as if he could commit himself to
never conceding.